Introduction

Dichotomous variables are frequently encountered in multiple regression
analysis, both as independent and dependent variables. A dichotomous
independent variable is used to determine whether group membership is related
to or will predict a certain outcome (e.g., whether gender predicts GPA). A
dichotomous dependent variable is used to determine a combination of variables
that will predict group membership (e.g., to predict dropping out of college).

Historically, whenever a dichotomous variable was studied as an
independent variable with one dependent variable, a t-test, analysis of
variance or analysis of covariance was conducted. When a dichotomous variable
was studied as a dependent variable, discriminant analysis was used.

As multiple regression became more common, its advocates suggested that
it could or should replace the t-test, ANOVA, ANCOVA or discriminant analysis
in dealing with dichotomous variables by using coded variables.

Recently, however, Cox (1970), Goodman (1978), Aldrich and Nelson
(1984), and others have questioned the practice of using multiple regression
when a dichotomous variable is used as the dependent variable. The most
frequently suggested replacement for multiple regression is logistic
regression.

In the introduction to Aldrich and Nelson (1984), it is suggested that
ordinary regression analysis is not an appropriate strategy to analyze
qualitative dependent variables, including those that are dichotomous. They
go on to express the limitations of multiple regression very strongly:

     Perhaps because of its widespread popularity,
     regression may be one of the most abused statistical
     techniques in the social sciences. While estimates
     derived from regression analysis may be robust against
     errors in some assumptions, other assumptions are crucial,
     and their failure will lead to quite unreasonable
     estimates. Such is the case when the dependent variable
     is a qualitative measure rather than a continuous,
     interval measure. ... For example we shall show that
     regression estimates with a qualitative dependent variable
     may seriously misestimate the magnitude of the effects of
     independent variables, [and] that all of the standard
     statistical inferences such as hypothesis tests . . . are
     unjustified (p. 9, 10).

The authors suggest that the failure of regression is "particularly 
troubling in the behavioral sciences" (p. 10), giving examples of qualitative
dichotomous variables from the fields of political science, economics and
sociology. Similar criticisms concerning dichotomous dependent variables are 
given strong emphasis in multiple regression textbooks aimed at economics and 
sociology, but popular regression textbooks in the behavioral sciences related 
to psychology and education do not express this same concern. For example, 
neither Cohen & Cohen (1975) nor Pedhazur (1982) deals with weighted least
squares or logistic regression, two methods mentioned by multiple regression 
critics as preferable with dichotomous dependent variables. Both texts state 
that multiple regression can be used for and is mathematically equivalent to 
discriminant analysis when the dependent variable is a dichotomy (Cohen & 
Cohen, p. 442; Pedhazur, p. 887), but neither gives an indication that there
are criticisms of this use. Tatsuoka (1971) states that in the dichotomous 
dependent variable case, multiple regression, discriminant analysis and 
canonical correlation are all mathematically equivalent and again, no
indication is given of any criticisms of this approach. 

Neter et al. (1983) list three problems that arise when the dependent
variable is dichotomous: 1) non-normal error terms, 2) non-constant error
variance, and 3) constraints on the response function. They state that even
with binary dependent variables, ordinary least squares still provides
unbiased estimators under quite general conditions, and "when the sample size
is large, inferences concerning the regression coefficients and mean responses
can be made in the same fashion as when the error terms are assumed to be
normally distributed" (p. 357). They add, however, that these estimators will
not be efficient, giving larger variances than could be obtained with weighted
procedures.

The solutions proposed to these problems include using weighted least
squares to give constant error variance and using a transformation (such as
logistic) that limits the response function to a range of 0 to 1.
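
The two remedies named above can be sketched in a few lines of Python. This
is only an illustration, not a procedure taken from any of the cited authors:
the function names, the clipping constant and the sample values are all
assumptions made for the example. It shows the logistic transformation, which
keeps the response between 0 and 1, and the weights 1 / (p(1 - p)) that a
weighted least squares fit might use, since a 0/1 response with mean p has
variance p(1 - p).

```python
import math


def logistic(z):
    """Map a linear predictor z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of logistic(): the log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def wls_weights(fitted, eps=1e-6):
    """Weights 1 / (p * (1 - p)) for a weighted least squares fit.

    A 0/1 response with mean p has variance p * (1 - p), so weighting each
    case by the reciprocal of that variance aims at constant error variance.
    Fitted values are clipped to (eps, 1 - eps) because an ordinary least
    squares fit can fall outside the 0 to 1 range; the clipping rule is an
    ad hoc assumption of this sketch.
    """
    weights = []
    for p in fitted:
        p = min(max(p, eps), 1.0 - eps)
        weights.append(1.0 / (p * (1.0 - p)))
    return weights


# Hypothetical linear-predictor values, chosen only for illustration.
for z in (-5.0, 0.0, 5.0):
    print(z, round(logistic(z), 4))

# Hypothetical first-stage fitted values, one of them outside 0 to 1.
print(wls_weights([0.5, 0.9, 1.2]))
```

Cases with fitted values near 0 or 1 receive very large weights, which is one
reason the choice of clipping rule matters in practice.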

In comparing the use of logistic regression or discriminant analysis with
dichotomous dependent variables, Press and Wilson (1978) suggest that logistic
regression is preferred except when the populations are normal with identical
covariance matrices. They extend the criticisms of others to include
situations in which dichotomous variables are used as independent variables.
They state that logistic regression is valid for a wide variety of underlying
assumptions including 1) all explanatory variables are multivariate normally
distributed with equal covariance matrices, 2) all explanatory variables are
independent and dichotomous, and 3) some are multivariate normal and some
dichotomous, whereas discriminant analysis is only valid under the first set of
assumptions. These comments are not directed at multiple regression, but
would apply in those situations where it is mathematically equivalent to
discriminant analysis. Their conclusion is that logistic regression with
maximum likelihood estimation is preferred to linear discriminant analysis.
They state, however, that it is unlikely that the two methods will give
markedly different results or yield substantially different linear functions
unless there is a large proportion of observations whose x-values lie in
regions of the factor space with linear logistic response probabilities near
zero or one. They go on to say that logistic regression is preferred
when the normality assumptions are violated, especially when many of the
independent variables are qualitative.

The critics state that, in addition to the predictions made by the
regression equation with a dichotomous dependent variable, the statistical
tests are also invalid. This would include the F test of the overall model
and the t values for each predictor in the model.

Cox (1970), in referring to the use of multiple regression with
dichotomous dependent variables, states that "the use of a model, the nature
of whose limitations can be foreseen, is not wise, except for very limited
purposes" (p. 18). If these critics are correct, it appears as if researchers
in education and psychology should discontinue the use of multiple regression
in these situations.

Problem

This paper is an attempt to assess the meaning of the charges made
against multiple regression and to suggest what the regression community in
education and psychology can do to come to terms with critics of multiple
regression. The purpose of this paper is not to evaluate the validity of the
criticisms but to deal with some logical extensions of them. If these
criticisms are valid, are t-tests, analysis of variance, analysis of
covariance, discriminant analysis, canonical correlation, and any use of dummy
variables in multiple regression also called into question?

The questions raised by this paper, then, are:

1. To what extent do these criticisms affect the validity of other
   comparable statistical procedures?

2. If other statistical procedures using different assumptions give
   identical results to multiple regression using dichotomous dependent
   variables, does this imply suspicion concerning the other procedures
   or suspicion concerning the validity of the criticisms or both?

Procedures and Findings 
To examine the validity and/or seriousness of these criticisms,
implications of this situation are considered by examining a set of data taken
from the A3 data set in Gunst & Mason (1980). This data set has 13 yearly
observations with 14 variables. The year variable was dichotomized by letting
the first 7 years be in one group and the last 6 years be the other group. The
data is analyzed in 5 different cases with different arrangements of the
dichotomous variable with one or two quantitative variables from this data
set. The dichotomous variable is considered as both a dependent variable and
an independent variable.

In Table 1, different combinations of quantitative and dichotomous
independent and dependent variables where multiple regression has been used
are presented with a listing of conventional alternative statistical methods
and methods recommended by multiple regression critics. The critics suggest 
that in cases where a dichotomous dependent variable is used (cases 1 and 3) 
multiple regression is inappropriate. The approach taken in this paper is to 
compare the results of multiple regression in these cases with results of 
cases where multiple regression has not been attacked (cases 2 and 4). 

Table 1

Possible Statistical Procedures to Use with Different
Combinations of Dichotomous and Quantitative Variables

Case  Dependent Variable  Independent Variable         Possible procedures

One Predictor
1.    1 Dichotomous       1 Quantitative               Logistic regression
                                                       Pearson correlation
                                                       Pt. bis. correlation

2.    1 Quantitative      1 Dichotomous                t test
                                                       Pearson correlation
                                                       Pt. bis. correlation

Two+ Predictors
3.    1 Dichotomous       Quantitative/0+ Dichotomous  Logistic regression
                                                       Discriminant analysis
                                                       Multiple regression

4.    1 Quantitative      Quantitative/1+ Dichotomous  Analysis of Covariance
                                                       Multiple regression

Table 2 presents the results of the one predictor cases with the
dichotomous variable as a dependent variable (case 1) and as an independent
variable (case 2). In these situations the t value is the same whether the
dichotomous variable is the independent or dependent variable. A one
predictor model is the simplest case of multiple regression, and the test of
significance of the relationship is mathematically identical to an independent
means t-test and a one-way ANOVA with two groups. Because the regression test
of significance (t value) is the same whether the dichotomous variable is the
independent or dependent variable, if a test of significance with a
dichotomous dependent variable is invalid, then all tests of significance for
an independent means t-test, a two-group one-way ANOVA and
correlation/regression with an independent dichotomous variable are also
invalid.

Table 2

One Predictor Examples

CASE 1: Multiple regression claimed to be invalid

   Dependent variable   = 2 (Dichotomous)
   Independent variable = 3 (Quantitative)

   t3 = -6.910 (same as case 2)

CASE 2: Multiple regression is valid

   Dependent variable   = 3 (Quantitative)
   Independent variable = 2 (Dichotomous)

   t2 = -6.910 (same as case 1)

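
The equivalence claimed for the one predictor cases can be checked on any
small data set. The sketch below uses made-up values (they are not the Gunst
& Mason data), with 13 cases split 7 and 6 to mirror the grouping described
above; the function and variable names are hypothetical. It computes the
slope t value with the dichotomy as the dependent variable, then as the
independent variable, and finally a pooled-variance independent means t-test.

```python
import math


def mean(values):
    return sum(values) / len(values)


def slope_t(y, x):
    """t value for the slope when y is regressed on the single predictor x."""
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    b1 = sxy / sxx
    sse = syy - b1 * sxy                    # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1


def pooled_t(group1, group0):
    """Independent means t-test with pooled variance (group1 minus group0)."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = mean(group1), mean(group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


# Made-up data: 13 cases, the first 7 coded 0 and the last 6 coded 1.
group = [0] * 7 + [1] * 6
score = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 13.3,
         10.2, 9.8, 10.9, 9.5, 10.4, 9.9]

t_case1 = slope_t(group, score)  # dichotomous dependent variable
t_case2 = slope_t(score, group)  # dichotomous independent variable
t_test = pooled_t([s for s, g in zip(score, group) if g == 1],
                  [s for s, g in zip(score, group) if g == 0])

print(t_case1, t_case2, t_test)
```

All three values agree up to rounding error because each reduces to
t = r * sqrt(n - 2) / sqrt(1 - r^2), where r is the correlation between the
two variables, and r does not depend on which variable is called dependent.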
Table 3 presents the results of the two predictor cases with the
dichotomous variable as a dependent variable (case 3) and as an independent
variable (cases 4a and 4b). Case 3 is a situation where multiple regression
and discriminant analysis are both frequently used but is considered to be
invalid by the critics of ordinary least squares due to the presence of a
dichotomous dependent variable. The t values in case 3 are testing the
significance of the relationship of each quantitative predictor with the
dichotomous dependent variable controlled for the other quantitative
predictor. Cases 4a and 4b give identical t values to those found in case 3
for the relationship between the dichotomous variable (which is now one of the
independent variables and in a legitimate place according to assumptions of
multiple regression) and the dependent quantitative variable. If the tests
for which the t values in case 3 are used are invalid, then the tests for
which the t values in cases 4a and 4b are used are also invalid. The t values
in cases 4a and 4b are the same as the square root of the F values that would
be computed with a one-way analysis of covariance in which the independent
quantitative variable was treated as the covariate and the independent
dichotomous variable as the grouping variable. Therefore, if case 3 is
invalid, then all one-way ANCOVA designs and any use of dummy variables in
multiple regression would be invalid also.

Table 3

Two Predictor Examples

CASE 3: Multiple regression claimed to be invalid

   Dependent Variable    = 2 (Dichotomous)
   Independent Variables = 4 (Quantitative)
                         = 3 (Quantitative)

   t4 = -0.124 (same as case 4a)
   t3 = -8.480 (same as case 4b)

CASE 4: Multiple regression is valid

a. Dependent Variable    = 4 (Quantitative)
   Independent Variables = 2 (Dichotomous)
                         = 3 (Quantitative)

   t2 = -0.124 (same as case 3)
   t3 = -0.397

b. Dependent Variable    = 3 (Quantitative)
   Independent Variables = 4 (Quantitative)
                         = 2 (Dichotomous)

   t4 = -0.397
   t2 = -8.480 (same as case 3)
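
The symmetry described for cases 3, 4a and 4b can be sketched the same way.
The example below is an illustration under stated assumptions: the data are
made up (not the Gunst & Mason values), the names group, x3 and x4 are
hypothetical stand-ins for variables 2, 3 and 4, and the partial t value is
computed by the residual-on-residual (Frisch-Waugh) route rather than by
whatever program produced Table 3.

```python
import math


def mean(values):
    return sum(values) / len(values)


def residuals(y, z):
    """Residuals of y after a simple regression on z (with intercept)."""
    mz, my = mean(z), mean(y)
    szz = sum((a - mz) ** 2 for a in z)
    b = sum((a - mz) * (c - my) for a, c in zip(z, y)) / szz
    return [(c - my) - b * (a - mz) for a, c in zip(z, y)]


def partial_t(y, x, z):
    """t value for the coefficient of x when y is regressed on x and z.

    Both y and x are first adjusted for z; the slope of the adjusted y on
    the adjusted x equals the multiple regression coefficient of x, and the
    error term has n - 3 degrees of freedom.
    """
    n = len(y)
    ey, ex = residuals(y, z), residuals(x, z)
    sxx = sum(a * a for a in ex)
    sxy = sum(a * c for a, c in zip(ex, ey))
    syy = sum(c * c for c in ey)
    b = sxy / sxx
    sse = syy - b * sxy  # residual sum of squares of the full model
    return b / math.sqrt(sse / (n - 3) / sxx)


# Made-up data: 13 cases, the first 7 coded 0 and the last 6 coded 1.
group = [0] * 7 + [1] * 6
x3 = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 13.3,
      10.2, 9.8, 10.9, 9.5, 10.4, 9.9]
x4 = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2,
      5.0, 5.9, 4.7, 5.3, 6.1, 5.4]

# Case 3: dichotomy as dependent variable, x4 and x3 as predictors.
t4_case3 = partial_t(group, x4, x3)
t3_case3 = partial_t(group, x3, x4)

# Case 4a: x4 dependent; case 4b: x3 dependent; dichotomy is a predictor.
t2_case4a = partial_t(x4, group, x3)
t2_case4b = partial_t(x3, group, x4)

print(t4_case3, t2_case4a)
print(t3_case3, t2_case4b)

# Squaring the dummy-variable t gives the one-way ANCOVA F (1 and n - 3 df).
print(t2_case4b ** 2)
```

Each pair matches up to rounding error because both t values test the same
partial correlation with the same degrees of freedom.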



Conclusion and Recommendations
It is clear from the above examples that the tests of significance are
identical whether the dichotomous variable is an independent variable or a
dependent variable. It appears, therefore, that if the critics of using
multiple regression with a dichotomous dependent variable are to be taken
seriously, they must also deal with all significance testing with t tests,
analysis of variance, analysis of covariance, discriminant analysis, and any
use of dummy variables in multiple regression. There may be other statistics
reported in a multiple regression analysis, such as the standard error of
estimate or predicted values, for which the interpretations may not be
appropriate when dichotomous dependent variables are used, but this paper will
not deal with these issues.